First, we need to install the aws.s3 package.
install.packages("aws.s3")
Load the data.table library (we'll use it later to tidy up results):
library(data.table)
data.table 1.12.8 using 2 threads (see ?getDTthreads). Latest news: r-datatable.com
In order to connect to S3, we need to authenticate with an AWS access key ID and secret access key (shown in this post only as placeholders, e.g. myaccesskey: AKIAXXXXXXXXXXXXXXXX, mysecretkey: xxxxxxxxxxxxxxxxxxxx). One way to authenticate is to set environment variables like this.
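A minimal sketch using Sys.setenv(); the key values below are placeholders, not real credentials, and the region is an assumption you should replace with your bucket's actual region:

```r
# Set AWS credentials as environment variables (placeholder values shown)
Sys.setenv(
  "AWS_ACCESS_KEY_ID"     = "AKIAXXXXXXXXXXXXXXXX",
  "AWS_SECRET_ACCESS_KEY" = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "AWS_DEFAULT_REGION"    = "us-east-1"
)
```

These are the standard variable names that aws.s3 reads when making requests.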
bucketlist()
Error in bucketlist() : could not find function "bucketlist"
This error simply means the aws.s3 package was installed but never loaded; run library(aws.s3) first, and bucketlist() will then list your buckets.
Two errors I stumbled upon here were:
- Forbidden (HTTP 403) because I inserted the wrong credentials for my user
- Forbidden (HTTP 403) because I incorrectly set up my user’s permission to access S3
Alternatively, you can pass your access key and secret access key as parameters to aws.s3 functions directly. For example, you can make calls like this.
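For instance, bucketlist() (like most aws.s3 functions) accepts key and secret arguments directly; the values here are placeholders for your own credentials:

```r
library(aws.s3)

# Pass credentials per call instead of via environment variables
# (placeholder values shown)
bucketlist(
  key    = "AKIAXXXXXXXXXXXXXXXX",
  secret = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
)
```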
Let’s start by uploading a CSV file to S3.
# Save customer segmentation datasets to CSV files in tempdir()
write.csv(customersegmentation_data, file.path(tempdir(), "data.csv"))
# Upload files to S3 bucket
put_object(
file = file.path(tempdir(), "data.csv"),
object = "data.csv",
bucket = "customersegmentationdata"
)
If you get an error like 301 Moved Permanently, it most likely means that something’s gone wrong with regard to your region. It could be that:
- You’ve misspelled or inserted the wrong region name for the environment variable AWS_DEFAULT_REGION (if you’re using environment vars)
- You’ve misspelled or inserted the wrong region name for the region parameter of put_object() (if you aren’t using environment vars)
- You’ve incorrectly set up your user’s permissions

These files are pretty small, so uploading them is a breeze. If you’re trying to upload a big file (> 100MB), you may want to set multipart = TRUE within the put_object() function.
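For the big-file case, that would look something like this (same file and bucket names as above; multipart = TRUE tells aws.s3 to split the upload into parts):

```r
# Upload a large file to S3 in multiple parts
put_object(
  file = file.path(tempdir(), "data.csv"),
  object = "data.csv",
  bucket = "customersegmentationdata",
  multipart = TRUE
)
```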
Now let’s list all the objects in our bucket using get_bucket().
get_bucket(bucket = "customersegmentationdata")
Bucket: customersegmentationdata
$Contents
Key: data.csv
LastModified: 2020-12-13T13:23:49.000Z
ETag: "7bd2101a21bda2d6ea265d1981d2be87-3"
Size (B): 45580638
Owner: babarneeta38
Storage class: STANDARD
This returns a list of s3_objects. I like using the rbindlist() function from the data.table package to collapse these objects into a nice and clean data.table (i.e. data.frame).
data.table::rbindlist(get_bucket(bucket = "customersegmentationdata"))
We can load one of these CSV files from S3 into R in a few different ways, for example with save_object() or the s3read_using() function. Note that these functions accept the object’s URI, which has the form s3://<bucket>/<key>.
tempfile <- tempfile() # temporary file path
save_object(object = "s3://customersegmentationdata/data.csv", file = tempfile)
[1] "C:\\Users\\Irshad\\AppData\\Local\\Temp\\RtmpqKmPy8\\file2830255a24b7"
df <- read.csv(tempfile)
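Alternatively, s3read_using() reads the object in one step, without an explicit temp file; you pass the reader function via its FUN argument:

```r
# Read the CSV straight from S3, using read.csv as the reader function
df <- s3read_using(
  FUN    = read.csv,
  object = "s3://customersegmentationdata/data.csv"
)
```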
Get the structure and summary of the data frame.
typeof(df)
[1] "list"
str(df)
'data.frame': 541909 obs. of 8 variables:
$ InvoiceNo : Factor w/ 25900 levels "536365","536366",..: 1 1 1 1 1 1 1 2 2 3 ...
$ StockCode : Factor w/ 4070 levels "10002","10080",..: 3538 2795 3045 2986 2985 1663 801 1548 1547 3306 ...
$ Description: Factor w/ 4224 levels ""," 4 PURPLE FLOCK DINNER CANDLES",..: 4027 4035 932 1959 2980 3235 1573 1698 1695 259 ...
$ Quantity : int 6 6 8 6 6 2 6 6 6 32 ...
$ InvoiceDate: Factor w/ 23260 levels "1/10/2011 10:04",..: 6839 6839 6839 6839 6839 6839 6839 6840 6840 6841 ...
$ UnitPrice : num 2.55 3.39 2.75 3.39 3.39 7.65 4.25 1.85 1.85 1.69 ...
$ CustomerID : int 17850 17850 17850 17850 17850 17850 17850 17850 17850 13047 ...
$ Country : Factor w/ 38 levels "Australia","Austria",..: 36 36 36 36 36 36 36 36 36 36 ...
summary(df)
InvoiceNo StockCode Description
573585 : 1114 85123A : 2313 WHITE HANGING HEART T-LIGHT HOLDER: 2369
581219 : 749 22423 : 2203 REGENCY CAKESTAND 3 TIER : 2200
581492 : 731 85099B : 2159 JUMBO BAG RED RETROSPOT : 2159
580729 : 721 47566 : 1727 PARTY BUNTING : 1727
558475 : 705 20725 : 1639 LUNCH BAG RED RETROSPOT : 1638
579777 : 687 84879 : 1502 ASSORTED COLOUR BIRD ORNAMENT : 1501
(Other):537202 (Other):530366 (Other) :530315
Quantity InvoiceDate UnitPrice CustomerID
Min. :-80995.00 10/31/2011 14:41: 1114 Min. :-11062.06 Min. :12346
1st Qu.: 1.00 12/8/2011 9:28 : 749 1st Qu.: 1.25 1st Qu.:13953
Median : 3.00 12/9/2011 10:03 : 731 Median : 2.08 Median :15152
Mean : 9.55 12/5/2011 17:24 : 721 Mean : 4.61 Mean :15288
3rd Qu.: 10.00 6/29/2011 15:58 : 705 3rd Qu.: 4.13 3rd Qu.:16791
Max. : 80995.00 11/30/2011 15:13: 687 Max. : 38970.00 Max. :18287
(Other) :537202 NA's :135080
Country
United Kingdom:495478
Germany : 9495
France : 8557
EIRE : 8196
Spain : 2533
Netherlands : 2371
(Other) : 15279
Finally, let’s look at the rows where Quantity is greater than 10.
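In base R this is a simple subset; here is a self-contained sketch using a tiny stand-in data frame (since the full dataset lives in S3), with made-up invoice numbers:

```r
# Tiny stand-in for the retail data frame loaded above
df <- data.frame(
  InvoiceNo = c("536365", "536366", "536367"),
  Quantity  = c(6, 32, 8)
)

# Keep only the rows where Quantity is greater than 10
high_qty <- df[df$Quantity > 10, ]
print(high_qty)
```

On the real data frame, `df[df$Quantity > 10, ]` works the same way, just over all 541,909 rows.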